ARM¶
- Reduced Instruction Set Computing (RISC)
- Less than 100 Instructions
- Data-processing instructions only operate on Registers
- ONLY Load/Store instructions can access memory.
- Most instructions can be used for Conditional Execution
- Before ARMv3, data is always stored in little-endian format
- ARMv3 and later are bi-endian: little-endian by default, with switchable endian-ness for data
- Uses little-endian format for Instructions
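To make the byte-order difference concrete, here is a small sketch in Python (not ARM code) that lays out one 32-bit value both ways with the standard `struct` module. The value `0x12345678` is an arbitrary example, not something taken from these notes.

```python
import struct

value = 0x12345678  # arbitrary example value

# "<I" packs an unsigned 32-bit integer little-endian: least significant byte first
little = struct.pack("<I", value)
# ">I" packs the same integer big-endian: most significant byte first
big = struct.pack(">I", value)

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```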
ARM Family | ARM Architecture |
---|---|
ARM7 | ARM v4 |
ARM9 | ARM v5 |
ARM11 | ARM v6 |
Cortex-A | ARM v7-A |
Cortex-R | ARM v7-R |
Cortex-M | ARM v7-M |
ARM Mode:
- Instructions are always 4 bytes (32 bits) long, so the R15 Program Counter advances by 4 bytes per instruction
Writing Assembly¶
Use `as` to assemble an ASM file into an object file.
Use `ld` to link object files into a binary.
as program.s -o program.o
ld program.o -o program
`.string` is null terminated.
`.ascii` is not null terminated.
Instructions¶
Instruction | Description |
---|---|
MOV | Move data |
EOR | Bitwise XOR |
MVN | Move NOT (bitwise complement) |
LDR | Load |
ADD | Addition |
STR | Store |
SUB | Subtraction |
LDM | Load Multiple |
MUL | Multiplication |
STM | Store Multiple |
LSL | Logical Shift Left |
PUSH | Push on Stack |
LSR | Logical Shift Right |
POP | Pop off Stack |
ASR | Arithmetic Shift Right |
B | Branch |
ROR | Rotate Right |
BL | Branch with Link |
CMP | Compare |
BX | Branch and eXchange |
AND | Bitwise AND |
BLX | Branch with Link and eXchange |
ORR | Bitwise OR |
SWI/SVC | System Call |
The Barrel Shifter applies a shift or rotate to the second operand of an instruction, which can shrink multiple instructions into one.
- `Rx, ASR n`: Register x with arithmetic shift right by n bits (1 ≤ n ≤ 32)
- `Rx, LSL n`: Register x with logical shift left by n bits (0 ≤ n ≤ 31)
- `Rx, LSR n`: Register x with logical shift right by n bits (1 ≤ n ≤ 32)
- `Rx, ROR n`: Register x with rotate right by n bits (1 ≤ n ≤ 31)
- `Rx, RRX`: Register x with rotate right by one bit, with extend
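The shift types above can be modelled on 32-bit values in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the carry-out that the real barrel shifter can produce is modelled only for `rrx`.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit (bit 31) is replicated."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret the 32-bit pattern as a negative integer
    return (x >> n) & MASK

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one bit through the carry flag; returns (result, new_carry)."""
    x &= MASK
    return ((carry << 31) | (x >> 1)) & MASK, x & 1

print(hex(lsl(3, 2)))           # 0xc
print(hex(asr(0x80000000, 4)))  # 0xf8000000
print(hex(ror(1, 1)))           # 0x80000000
```

For example, `ADD R0, R1, R1, LSL #2` computes `R1 + lsl(R1, 2)`, that is `R1 * 5`, in a single instruction.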
Examples:
ADD R0, R1, R2 // R1 + R2 -> R0
ADD R0, R1, #2 // R1 + 2 -> R0
LDR R2, [R0] // Use the address in R0 and load the data at the address into R2.
LDR R1, [PC, #12] // Load the data at address PC + 12 into R1.
STR R2, [R1] // Store the value of R2 at the address held in R1
STR r2, [r1, #4]! // R1 + 4 -> R1
// Store the value in R2 at the new address in R1 (pre-indexed, offset 4).
LDR r3, [r1], #4 // Load the value at memory address found in R1 to register R3.
// R1 + 4 -> R1
STR r2, [r1, r2, LSL#2] // Store the value in R2 to the memory address in R1 with the offset R2 left-shifted by 2.
STR r2, [r1, r2, LSL#2]! // R1 + R2<<2 -> R1
// Store the value in R2 to the new memory address found in R1.
LDR r3, [r1], r2, LSL#2 // Load value at memory address found in R1 to the register R3.
// R1 + R2<<2 -> R1
MOVLE R0, #5 // If the LE (Less Than or Equal) condition holds, 5 -> R0
MOV R0, R1, LSL #1 // Store left shifted R1 -> R0
adr r0, words+12 /* address of words[3] -> r0 */
ldr r1, array_buff_bridge /* address of array_buff[0] -> r1 */
ldr r2, array_buff_bridge+4 /* address of array_buff[2] -> r2 */
ldm r0, {r4,r5} /* words[3] -> r4 = 0x03; words[4] -> r5 = 0x04 */
stm r1, {r4,r5} /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04 */
ldmia r0, {r4-r6} /* words[3] -> r4 = 0x03, words[4] -> r5 = 0x04; words[5] -> r6 = 0x05; */
stmia r1, {r4-r6} /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04; r6 -> array_buff[2] = 0x05 */
ldmib r0, {r4-r6} /* words[4] -> r4 = 0x04; words[5] -> r5 = 0x05; words[6] -> r6 = 0x06 */
stmib r1, {r4-r6} /* r4 -> array_buff[1] = 0x04; r5 -> array_buff[2] = 0x05; r6 -> array_buff[3] = 0x06 */
ldmda r0, {r4-r6} /* words[3] -> r6 = 0x03; words[2] -> r5 = 0x02; words[1] -> r4 = 0x01 */
ldmdb r0, {r4-r6} /* words[2] -> r6 = 0x02; words[1] -> r5 = 0x01; words[0] -> r4 = 0x00 */
stmda r2, {r4-r6} /* r6 -> array_buff[2] = 0x02; r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00 */
stmdb r2, {r4-r5} /* r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00; */
push {r0, r1}
pop {r2, r3}
stmdb sp!, {r0, r1}
ldmia sp!, {r4, r5}
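The four LDM/STM addressing modes differ only in which word addresses they touch relative to the base register. The Python sketch below reproduces the `words` results from the comments above; the base address `0x1000` and the helper name `block_addresses` are assumptions made for this illustration.

```python
def block_addresses(base, count, mode):
    """Return the word addresses used by LDM/STM <mode> for `count` registers.

    Addresses are returned lowest first; the lowest-numbered register in the
    list is always paired with the lowest address.
    """
    if mode == "ia":    # increment after
        start = base
    elif mode == "ib":  # increment before
        start = base + 4
    elif mode == "da":  # decrement after
        start = base - 4 * (count - 1)
    elif mode == "db":  # decrement before
        start = base - 4 * count
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [start + 4 * i for i in range(count)]

# words[i] == i, stored at a hypothetical address 0x1000
WORDS_BASE = 0x1000
memory = {WORDS_BASE + 4 * i: i for i in range(8)}
r0 = WORDS_BASE + 12  # address of words[3], as in `adr r0, words+12`

for mode in ("ia", "ib", "da", "db"):
    values = [memory[a] for a in block_addresses(r0, 3, mode)]
    print(f"ldm{mode} r0, {{r4-r6}} -> r4, r5, r6 = {values}")
```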
Immediate Values in ARM¶
An immediate value in ARM must be expressible as an 8-bit value rotated by an even number of bits within the 32-bit word.
MOV R0, #255 //Valid (0xFF) = 0b11111111 << 0
MOV R0, #960 //Valid (0x3C0) = 0b00001111 << 6 = 0b1111000000
MOV R0, #961 //Invalid (0x3C1) = 0b1111000001, which does not fit in 8 bits
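Whether a constant can be encoded as an ARM-mode immediate can be checked mechanically. The sketch below assumes the classic ARM data-processing encoding (an 8-bit value rotated right by an even amount); the function names are made up for this example, and Thumb-2 uses different rules that are not modelled here.

```python
def rol32(value, n):
    """Rotate a 32-bit value left by n bits."""
    value &= 0xFFFFFFFF
    return ((value << n) | (value >> (32 - n))) & 0xFFFFFFFF

def is_arm_immediate(value):
    """True if value equals an 8-bit constant rotated right by an even amount."""
    for rotation in range(0, 32, 2):
        # Undo a right rotation by rotating left; the result must fit in 8 bits
        if rol32(value, rotation) <= 0xFF:
            return True
    return False

for candidate in (255, 960, 961):
    print(candidate, is_arm_immediate(candidate))
```

This reports 255 and 960 as encodable and 961 as not encodable, matching the three `MOV` examples above.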
Data Types¶
- Signed data: can hold negative values, at the cost of a smaller positive range
- Unsigned data: holds only zero and positive values, with a larger positive range
- `ldr`: Load Word
- `ldrh`: Load unsigned Half Word
- `ldrsh`: Load signed Half Word
- `ldrb`: Load unsigned Byte
- `ldrsb`: Load signed Byte
- `str`: Store Word
- `strh`: Store Half Word
- `strb`: Store Byte
- Stores have no signed variants (`strsh` and `strsb` do not exist): a store writes the low 16 or 8 bits of the register regardless of sign.
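The signed and unsigned loads differ only in how they fill the upper bits of the 32-bit destination register. A rough Python model, with helper names invented for this sketch:

```python
def zero_extend(value, bits):
    """What ldrb (bits=8) or ldrh (bits=16) leaves in the 32-bit register."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """What ldrsb (bits=8) or ldrsh (bits=16) leaves in the 32-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):                 # top bit set: negative value
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)  # fill the upper bits with ones
    return value

byte_in_memory = 0xF0
print(hex(zero_extend(byte_in_memory, 8)))  # 0xf0
print(hex(sign_extend(byte_in_memory, 8)))  # 0xfffffff0
```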
Registers¶
- 30 General Purpose 32-bit Registers
  - The first 16 (R0-R15) are accessible in User-Level Mode
  - R7 (Holds Syscall Number)
  - R11 (Frame Pointer) Points to the base of the current stack frame
  - R12 (Intra-Procedure-call scratch register)
  - R13 (Stack Pointer) Points to the top of the stack, where the most recently pushed element is.
  - R14 (Link Register) Used to store the return address
  - R15 (Program Counter)
    - When a Branch/Jump is executed, holds the destination address
    - Otherwise, in ARM state, reads as two instructions (8 bytes) ahead of the current instruction (older ARM processors fetched instructions two ahead, and this behavior is kept to ensure compatibility)
- Current Program Status Register (CPSR)
  - Bit 0-4: (Processor/Privilege Mode)
  - Bit 5: (Thumb) 1 when in Thumb
  - Bit 6: (FIQ disable)
  - Bit 7: (IRQ disable)
  - Bit 8: (Abort disable)
  - Bit 9: (Endian-ness) 0 for little-endian, 1 for big-endian
  - Bit 10-15: (IT[7:2]) If-Then execution state bits for Thumb IT blocks
  - Bit 16-19: (GE) Greater-than-or-Equal flags, set by SIMD instructions
  - Bit 24: (Jazelle bit) Allows some ARM processors to execute Java bytecode in hardware.
  - Bit 25-26: (IT[1:0]) If-Then execution state bits for Thumb IT blocks
  - Bit 27: (Q, cumulative saturation) Set when a saturating or DSP instruction overflows or saturates
  - Bit 28: (Overflow) Set when the signed result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.
  - Bit 29: (Carry)
    - Set when the result of an addition is greater than or equal to 2^32
    - Set when the result of a subtraction is positive or zero (no borrow)
    - Set by an inline barrel shifter operation in a move or logical instruction (the last bit shifted out)
  - Bit 30: (Zero) 1 when result is zero
  - Bit 31: (Negative) 1 when result is negative
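The bit positions listed above are enough to decode a raw CPSR value. The following Python sketch does so; the mode-number table is an addition taken from the ARMv7-A architecture rather than from these notes, and the sample CPSR values are arbitrary.

```python
# Mode encodings for CPSR bits 0-4 (ARMv7-A; not listed in the notes above)
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields described in the list above."""
    def bit(n):
        return (cpsr >> n) & 1
    return {
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
        "T": bit(5), "F": bit(6), "I": bit(7), "A": bit(8), "E": bit(9),
        "J": bit(24), "V": bit(28), "C": bit(29), "Z": bit(30), "N": bit(31),
    }

fields = decode_cpsr(0x60000010)  # arbitrary sample: Z and C set, User mode
print(fields["mode"], fields["Z"], fields["C"], fields["T"])  # User 1 1 0
```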
Example:
mov r0, #2
mov r1, #4
cmp r1, r0 // 4-2 Carry flag is set
cmp r0, r1 // 2-4 Negative flag is set
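The two comparisons above can be reproduced with a small Python model of how `cmp` derives the N, Z, C, and V flags from a 32-bit subtraction. The function name is made up for this sketch, and it only models `cmp`/`subs`, not additions.

```python
MASK = 0xFFFFFFFF

def cmp_flags(rn, op2):
    """Return the N, Z, C, V flags that `cmp rn, op2` (rn - op2) would set."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31          # bit 31 of the result
    z = int(result == 0)
    c = int(rn >= op2)        # carry means "no borrow": unsigned rn >= op2
    # Signed overflow: the operands have different signs and the sign of the
    # result differs from the sign of rn
    v = ((rn ^ op2) & (rn ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}

print(cmp_flags(4, 2))  # {'N': 0, 'Z': 0, 'C': 1, 'V': 0}
print(cmp_flags(2, 4))  # {'N': 1, 'Z': 0, 'C': 0, 'V': 0}
```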
Conditionals¶
The condition codes below can be appended to most ARM instructions; the instruction then executes only when the flags are in the stated state.
Condition Code | Meaning (for cmp or subs) | Status of Flags |
---|---|---|
EQ | Equal | Z==1 |
NE | Not Equal | Z==0 |
GT | Signed Greater Than | (Z==0) && (N==V) |
LT | Signed Less Than | N!=V |
GE | Signed Greater Than or Equal | N==V |
LE | Signed Less Than or Equal | (Z==1) \|\| (N!=V) |
CS or HS | Unsigned Higher or Same (or Carry Set) | C==1 |
CC or LO | Unsigned Lower (or Carry Clear) | C==0 |
MI | Negative (or Minus) | N==1 |
PL | Positive (or Plus) | N==0 |
AL | Always executed | – |
NV | Never executed | – |
VS | Signed Overflow | V==1 |
VC | No signed Overflow | V==0 |
HI | Unsigned Higher | (C==1) && (Z==0) |
LS | Unsigned Lower or same | (C==0) \|\| (Z==1) |
Example:
.global main
main:
    mov r0, #2          @ r0 = 2
    cmp r0, #3          @ computes r0 - 3 = -1: sets the Negative flag, so LT holds
    addlt r0, r0, #1    @ LT holds, so r0 = r0 + 1 = 3
    cmp r0, #3          @ computes r0 - 3 = 0: sets the Zero flag and clears the Negative flag
    addlt r0, r0, #1    @ LT no longer holds, so this instruction is skipped
    bx lr               @ return to the address in the lr register
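The condition table can be turned into an executable check. The Python sketch below evaluates a condition code against given N, Z, C, V flag values, using the flag combinations defined by the ARM architecture; the names `CONDITIONS` and `condition_passes` are invented for this example.

```python
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}

def condition_passes(cond, n, z, c, v):
    """True if an instruction with this condition suffix would execute."""
    return CONDITIONS[cond.upper()](n, z, c, v)

# Flags after `cmp r0, #3` with r0 = 2 (2 - 3 is negative, a borrow occurs)
print(condition_passes("LT", n=1, z=0, c=0, v=0))  # True: addlt executes
# Flags after `cmp r0, #3` with r0 = 3 (the result is zero, no borrow)
print(condition_passes("LT", n=0, z=1, c=1, v=0))  # False: addlt is skipped
```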
IF-THEN-(Else) Conditional Instruction¶
The IT (If-Then) instruction makes up to four of the following Thumb instructions conditional.
- `IT`: If-Then (if TRUE, execute the next instruction)
- `ITT`: If-Then-Then (if TRUE, execute the next 2 instructions)
- `ITE`: If-Then-Else (if TRUE, execute the next instruction; if FALSE, skip the next instruction and execute the one after it)
- `ITTE`: If-Then-Then-Else (if TRUE, execute the next 2 instructions and skip the third; if FALSE, skip 2 instructions and execute the third)
- `ITTEE`: If-Then-Then-Else-Else (if TRUE, execute the next 2 instructions and skip the 2 after them; if FALSE, skip 2 instructions and execute the 2 after them)
Example:
ITTE NE ; Next 3 instructions are conditional
ANDNE R0, R0, R1 ; ANDNE does not update condition flags
ADDSNE R2, R2, #1 ; ADDSNE updates condition flags
MOVEQ R2, R3 ; Conditional move Where EQ is the Inverse of NE
ITE GT ; Next 2 instructions are conditional
ADDGT R1, R0, #55 ; Conditional addition in case the GT is true
ADDLE R1, R0, #48 ; Conditional addition in case the GT is not true
ITTEE EQ ; Next 4 instructions are conditional
MOVEQ R0, R1 ; Conditional MOV
ADDEQ R2, R2, #10 ; Conditional ADD
ANDNE R3, R3, #1 ; Conditional AND
BNE.W dloop ; A branch instruction can only be used as the last instruction of an IT block
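The pattern behind the IT mnemonics is mechanical: the first slot always uses the given condition, each further `T` repeats it, and each `E` uses its inverse. A Python sketch of that expansion, with the helper name `it_conditions` invented for this example:

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "HS": "LO", "LO": "HS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE", "GT": "LE", "LE": "GT",
}

def it_conditions(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    if mnemonic[:2] != "IT" or len(mnemonic) > 5 or set(mnemonic[2:]) - {"T", "E"}:
        raise ValueError(f"not a valid IT mnemonic: {mnemonic}")
    # "IT" itself covers the first instruction; every extra letter adds a slot
    slots = "T" + mnemonic[2:]
    return [cond if slot == "T" else INVERSE[cond] for slot in slots]

print(it_conditions("ITTE", "NE"))   # ['NE', 'NE', 'EQ']
print(it_conditions("ITE", "GT"))    # ['GT', 'LE']
print(it_conditions("ITTEE", "EQ"))  # ['EQ', 'EQ', 'NE', 'NE']
```

The three results line up with the condition suffixes used in the three example blocks above.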
Branching¶
Branch (B): Simple jump to a label or function
Branch link (BL): Saves the return address (the address of the instruction after the BL) in the LR register and jumps to the function
Branch exchange (BX): Jumps to the address held in a register and switches instruction set (ARM <-> Thumb) according to bit 0 of that address
Branch link exchange (BLX): Saves the return address in the LR register, jumps to the function, and switches instruction set (ARM <-> Thumb)
Switch THUMB Mode:
.text
.global _start
_start:
.code 32 @ ARM mode
add r2, pc, #1 @ put PC+1 into R2
bx r2 @ branch + exchange to R2
.code 16 @ Thumb mode
mov r0, #1
Conditional Branch Example:
.text
.global _start
_start:
    mov r0, #2        @ r0 = 2
    mov r1, #2        @ r1 = 2
    add r0, r0, r1    @ r0 = r0 + r1
    cmp r0, #4        @ compare r0 with 4
    beq func1         @ if r0 == 4, jump to func1
    add r1, #5        @ else r1 = r1 + 5
    b func2           @ jump to func2
func1:
    mov r1, r0        @ r1 = r0
    bx lr             @ jump to the address in lr
func2:
    mov r0, r1        @ r0 = r1
    bx lr             @ jump to the address in lr
Stack¶
A stack can grow up or down.
If the stack grows towards higher memory addresses, it is an ascending stack.
If the stack grows towards lower memory addresses, it is a descending stack.
If the stack pointer points to the last item pushed, it is a full stack.
If the stack pointer points to the next free slot, it is an empty stack.
Stack Type | Store Instruction | Load Instruction |
---|---|---|
Full descending | STMFD (STMDB, Decrement Before) | LDMFD (LDM, Increment after) |
Full ascending | STMFA (STMIB, Increment Before) | LDMFA (LDMDA, Decrement After) |
Empty descending | STMED (STMDA, Decrement After) | LDMED (LDMIB, Increment Before) |
Empty ascending | STMEA (STM, Increment after) | LDMEA (LDMDB, Decrement Before) |
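As an illustration of the first table row, the Python sketch below models a full descending stack: a push behaves like `STMDB sp!` and a pop like `LDMIA sp!`. The class name, the starting address `0x8000`, and the pushed values are all hypothetical.

```python
class FullDescendingStack:
    """Toy model of a full descending stack of 32-bit words."""

    def __init__(self, top):
        self.sp = top     # SP starts at the (hypothetical) top of the stack region
        self.memory = {}  # address -> 32-bit word

    def push(self, values):
        """Like STMDB sp!, {...}: decrement SP first, then store.

        The first value (lowest-numbered register) lands at the lowest address.
        """
        self.sp -= 4 * len(values)
        for i, value in enumerate(values):
            self.memory[self.sp + 4 * i] = value

    def pop(self, count):
        """Like LDMIA sp!, {...}: load from SP upwards, then increment SP."""
        values = [self.memory[self.sp + 4 * i] for i in range(count)]
        self.sp += 4 * count
        return values

stack = FullDescendingStack(top=0x8000)
stack.push([0x11, 0x22])  # like push {r0, r1}
print(hex(stack.sp))      # 0x7ff8
print(stack.pop(2))       # [17, 34], like pop {r2, r3}
print(hex(stack.sp))      # 0x8000
```

Because the stack is "full", SP always points at the most recently pushed word rather than at the next free slot.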
Thumb Mode¶
Thumb-1:
- 16 bit Instructions
- Instructions are always 2 bytes long, so the R15 Program Counter advances by 2 bytes per instruction
- Used in ARMv6 and earlier
Thumb-2:
- Extends Thumb-1
- 16 bit or 32 bit Instructions
- 32-bit instructions have a `.w` suffix added to the instruction
- Used in ARMv6T2, ARMv7
- R15 Program Counter advances by 2 or 4 bytes, depending on the width of the instruction
- Conditional Execution using the IT instruction
ThumbEE:
- Aimed at dynamically generated code (code compiled on the device either shortly before or during execution).
Switching state¶
Switching to Thumb mode:
1. Use the BX (Branch and Exchange) or the BLX (Branch with Link and Exchange) instruction and set the least significant bit of the destination register to 1.
- This does not cause alignment issues because the processor will ignore the last bit.
2. We know that we are in Thumb mode if the T bit in the current program status register is set.
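The role of the least significant bit can be sketched in Python: bit 0 of the register selects the state, and the remaining bits give the real branch target. The function name and the instruction address `0x8000` are assumptions for this illustration.

```python
def bx_target(register_value):
    """Return (address, state) that `bx` would switch to for a register value."""
    state = "Thumb" if register_value & 1 else "ARM"
    address = register_value & ~1 & 0xFFFFFFFF  # bit 0 is not part of the address
    return address, state

# `add r2, pc, #1` executed in ARM state: PC reads as the current instruction + 8
instruction_address = 0x8000  # hypothetical address of the add instruction
r2 = (instruction_address + 8) + 1

address, state = bx_target(r2)
print(hex(address), state)  # 0x8008 Thumb
```

In the "Switch THUMB Mode" listing this works out neatly: the `add` and the `bx` occupy 8 bytes, so PC + 8 is exactly where the Thumb code begins.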
Emulating ARM with Unicorn¶
from unicorn import *
from unicorn.arm_const import *
from unicorn.unicorn_const import *
from capstone import *
import binascii

# Callback of the code hook: disassemble each instruction as it is executed
def hook_code(uc, addr, size, user_data):
    mem = uc.mem_read(addr, size)
    disas_single(bytes(mem), addr)

# Disassemble a single instruction and print its address, mnemonic and operands
def disas_single(data, addr):
    for i in capmd.disasm(data, addr):
        print(f"0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}")
        break

def map_memory(unicorn_obj, map_data, align_size=(1024 * 1024), default_perm=UC_PROT_ALL):
    next_free_block = 0x0
    # Walk the regions in ascending address order so the overlap check below works
    for memory_loc in sorted(map_data):
        data_info = map_data[memory_loc]
        # Set size if not set (data length rounded up to a multiple of align_size)
        if data_info.get("size") is None:
            data_info["size"] = ((len(data_info["data"]) // align_size) + 1) * align_size
        # Set permissions if not set
        if data_info.get("permissions") is None:
            data_info["permissions"] = default_perm
        # Move the region up if it would overlap the previous one
        if memory_loc < next_free_block:
            memory_loc = next_free_block
        # Record where the region really ends up so get_address() stays accurate
        data_info["address"] = memory_loc
        # Map the memory to the unicorn obj
        unicorn_obj.mem_map(memory_loc, data_info["size"], perms=data_info["permissions"])
        # Write the memory
        unicorn_obj.mem_write(memory_loc, data_info["data"])
        # Update the next possible write location
        next_free_block = memory_loc + data_info["size"]

def get_address(map_data, tag_name):
    for memory_loc, data_info in map_data.items():
        if data_info["tag"] == tag_name:
            return data_info.get("address", memory_loc)
    raise KeyError(f"no memory region tagged {tag_name!r}")

# Create a new instance of capstone (capstone has its own CS_* constants)
capmd = Cs(CS_ARCH_ARM, CS_MODE_ARM)

# Code to be emulated
with open("u-boot.bin", "rb") as in_file:  # opening for [r]eading as [b]inary
    ARM_CODE32 = in_file.read()

# File to be decrypted
with open("kernel.img.raw", "rb") as in_file:  # opening for [r]eading as [b]inary
    FILE_TOBE_DEC = in_file.read()

print("Emulate ARM code")
print("Shielder")
try:
    # Initialize emulator in ARM-32bit mode
    # with "ARM" ARM instruction set
    mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
    # Map Memory from Dictionary
    # Uboot | Stack | RAM
    # The RAM base 0x00400000 is an assumed address: the original listing used
    # 0x00000000 for both the stack and the RAM, so the second dictionary entry
    # silently replaced the first.
    STACK_SIZE = 2 * 1024 * 1024
    mem_map = {0x80800000: {"tag": "uboot", "data": ARM_CODE32},
               0x00000000: {"tag": "stack", "data": b"\x00" * STACK_SIZE},
               0x00400000: {"tag": "ram", "data": b"\x00" * (8 * 1024 * 1024)}}
    map_memory(mu, mem_map)
    # Copy the file to be decrypted into RAM (assumes it fits in the RAM region)
    mu.mem_write(get_address(mem_map, "ram"), FILE_TOBE_DEC)
    # Initialize machine registers
    # The stack grows downwards, so SP starts at the top of the stack data
    mu.reg_write(UC_ARM_REG_SP, get_address(mem_map, "stack") + STACK_SIZE)
    # First argument: memory pointer to the location of the file
    mu.reg_write(UC_ARM_REG_R0, get_address(mem_map, "ram"))
    # Second argument: memory pointer to the location to write the output to
    mu.reg_write(UC_ARM_REG_R1, get_address(mem_map, "ram"))
    # Third argument: block size to be read from the memory pointed to by r0
    mu.reg_write(UC_ARM_REG_R2, 512)
    # Hook every instruction and disassemble it with capstone
    mu.hook_add(UC_HOOK_CODE, hook_code)
    # Emulate code with no time limit
    # Address + start/end of the block_aes_decrypt function
    # this trick saves many headaches
    mu.emu_start(get_address(mem_map, "uboot") + 0x8c40, get_address(mem_map, "uboot") + 0x8c44)
    # Now print out some registers
    print("Emulation done. Below is the CPU context")
    r_r0 = mu.reg_read(UC_ARM_REG_R0)
    r_r1 = mu.reg_read(UC_ARM_REG_R1)
    r_r2 = mu.reg_read(UC_ARM_REG_R2)
    r_pc = mu.reg_read(UC_ARM_REG_PC)
    print(f">>> r0 = 0x{r_r0:x}")
    print(f">>> r1 = 0x{r_r1:x}")
    print(f">>> r2 = 0x{r_r2:x}")
    print(f">>> pc = 0x{r_pc:x}")
    print("\nReading data from first 512byte of the RAM at: " + hex(get_address(mem_map, "ram")))
    print("==== BEGIN ====")
    ram_data = mu.mem_read(get_address(mem_map, "ram"), 512)
    print(str(binascii.hexlify(ram_data)))
    print("==== END ====")
    # From the reversed binary, we know which are the magic bytes
    # at the beginning of the kernel
    if b"27051956" == binascii.hexlify(bytearray(ram_data[:4])):
        print("\nMagic Bytes match :)\n\n")
        with open("test.bin", "wb") as f:
            f.write(ram_data)
except UcError as e:
    print("ERROR: %s" % e)